
Raw vectors data layer in HNSW + move to base class [MOD-7496] #523

Merged (12 commits merged on Dec 19, 2024)

alonre24 (Collaborator) commented Aug 12, 2024

Describe the changes in the pull request

Use the new RawDataContainer interface in HNSW (currently backed by the concrete DataBlocksContainer implementation) and move the abstract vectors member to the base class.

This includes:

  • Moving the serialization (save/restore) of the HNSW vectors into DataBlocksContainer, since HNSW should no longer access the blocks directly. The same change should later be applied to the graph data blocks as well.
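A minimal sketch of the layering described above, assuming a much-simplified interface. Only `RawDataContainer`, `DataBlocksContainer` and `numBlocks` come from this PR; the methods `addElement`, `getElement`, `size`, the block layout, and the base class name `VecSimIndexBase` are hypothetical and only illustrate the idea of an index that owns an abstract vectors container.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical abstract interface: the index sees only element ids and raw bytes.
class RawDataContainer {
public:
    virtual ~RawDataContainer() = default;
    virtual void addElement(const void *data) = 0;        // append one vector
    virtual const char *getElement(size_t id) const = 0;  // raw bytes of vector `id`
    virtual size_t size() const = 0;                      // number of stored vectors
};

// Stores fixed-size elements in blocks of `block_size` elements each.
class DataBlocksContainer : public RawDataContainer {
public:
    DataBlocksContainer(size_t block_size, size_t element_bytes)
        : block_size_(block_size), element_bytes_(element_bytes) {
        assert(block_size > 0 && element_bytes > 0);
    }
    void addElement(const void *data) override {
        if (count_ % block_size_ == 0) {
            // The last block is full (or none exists yet): open a new one.
            blocks_.emplace_back();
            blocks_.back().reserve(block_size_ * element_bytes_);
        }
        const char *bytes = static_cast<const char *>(data);
        blocks_.back().insert(blocks_.back().end(), bytes, bytes + element_bytes_);
        ++count_;
    }
    const char *getElement(size_t id) const override {
        assert(id < count_);
        return blocks_[id / block_size_].data() + (id % block_size_) * element_bytes_;
    }
    size_t size() const override { return count_; }
    size_t numBlocks() const { return blocks_.size(); }

private:
    size_t block_size_, element_bytes_, count_ = 0;
    std::vector<std::vector<char>> blocks_;
};

// Hypothetical base index: owns the raw vectors, so allocation and destruction
// happen once here instead of in every algorithm.
class VecSimIndexBase {
public:
    explicit VecSimIndexBase(std::unique_ptr<RawDataContainer> data)
        : vectors(std::move(data)) {}
    virtual ~VecSimIndexBase() = default;  // unique_ptr frees the container
    const RawDataContainer &getVectors() const { return *vectors; }

protected:
    std::unique_ptr<RawDataContainer> vectors;
};
```

Because the base class holds a `RawDataContainer` pointer rather than blocks, a different container implementation could later be swapped in without touching the algorithm code.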

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

codecov bot commented Nov 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.97%. Comparing base (1381f64) to head (f4ffbbb).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #523      +/-   ##
==========================================
+ Coverage   96.93%   96.97%   +0.04%     
==========================================
  Files         100      100              
  Lines        5287     5295       +8     
==========================================
+ Hits         5125     5135      +10     
+ Misses        162      160       -2     
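The coverage figures can be checked by hand: coverage is hits divided by lines, and misses are lines minus hits. The sketch below assumes the percentage is rounded down to two decimal places, which as far as we know is Codecov's default setting; `coveragePercent` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

// Coverage in percent, rounded down to two decimals (assumed rounding mode).
double coveragePercent(long hits, long lines) {
    assert(lines > 0);
    return std::floor(10000.0 * hits / lines) / 100.0;
}
```

With the numbers above, 5125 / 5287 gives 96.93% and 5135 / 5295 gives 96.97%, matching the reported +0.04% change.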

alonre24 requested a review from GuyAv46 on November 11, 2024 16:40
```cpp
void DataBlocksContainer::saveBlocks(std::ostream &output) const {
    // Save number of blocks
    unsigned int num_blocks = this->numBlocks();
    Serializer::writeBinaryPOD(output, num_blocks);
    // ... (rest of the function is not shown in this review excerpt)
}
```
Collaborator commented:

We should consider saving only the vectors, without the metadata about the number of blocks and their sizes, so that they can be loaded into other containers (or into blocks of a different size).
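A sketch of the suggested format, assuming float vectors of a fixed dimension and a simplified block representation; `Blocks`, `saveVectorsFlat` and `loadVectorsFlat` are hypothetical names. Only a vector count, the dimension and the raw values are written, so the loader is free to rebuild the data with any block size. Values are written in native byte order, which assumes save and load happen on the same platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Each block holds up to block_size vectors of `dim` floats, stored back to back.
using Blocks = std::vector<std::vector<float>>;

// Write only the vectors: count, dim, then the raw floats in id order.
// The number of blocks and their sizes are deliberately not persisted.
void saveVectorsFlat(const Blocks &blocks, uint64_t dim, std::ostream &out) {
    assert(dim > 0);
    uint64_t count = 0;
    for (const auto &b : blocks) count += b.size() / dim;
    out.write(reinterpret_cast<const char *>(&count), sizeof(count));
    out.write(reinterpret_cast<const char *>(&dim), sizeof(dim));
    for (const auto &b : blocks)
        out.write(reinterpret_cast<const char *>(b.data()),
                  static_cast<std::streamsize>(b.size() * sizeof(float)));
}

// Reload into blocks of any size: the layout is rebuilt, not restored.
Blocks loadVectorsFlat(std::istream &in, size_t block_size) {
    assert(block_size > 0);
    uint64_t count = 0, dim = 0;
    in.read(reinterpret_cast<char *>(&count), sizeof(count));
    in.read(reinterpret_cast<char *>(&dim), sizeof(dim));
    assert(in && dim > 0);
    Blocks blocks;
    std::vector<float> vec(dim);
    for (uint64_t i = 0; i < count; ++i) {
        in.read(reinterpret_cast<char *>(vec.data()),
                static_cast<std::streamsize>(dim * sizeof(float)));
        assert(in);
        if (i % block_size == 0) blocks.emplace_back();  // start a new block
        blocks.back().insert(blocks.back().end(), vec.begin(), vec.end());
    }
    return blocks;
}
```

For example, three 2-dimensional vectors saved from blocks of size 2 can be loaded back into a single block of size 3.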

Collaborator commented:

This also means we wouldn't need to add serialization to the container class; it could stay at the algorithm level.
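To illustrate keeping serialization at the algorithm level, here is a sketch in which the container interface offers only read access and the index writes the vectors itself. `ContiguousContainer`, `elementBytes` and `saveIndexVectors` are hypothetical names; because the save routine only calls `size()`, `elementBytes()` and `getElement()`, the same code would work for any container implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// The container exposes data access only; it has no save/restore methods.
class RawDataContainer {
public:
    virtual ~RawDataContainer() = default;
    virtual size_t size() const = 0;
    virtual size_t elementBytes() const = 0;
    virtual const char *getElement(size_t id) const = 0;
};

// A trivial contiguous implementation, standing in for any container type.
class ContiguousContainer : public RawDataContainer {
public:
    ContiguousContainer(std::vector<char> bytes, size_t element_bytes)
        : bytes_(std::move(bytes)), element_bytes_(element_bytes) {
        assert(element_bytes_ > 0 && bytes_.size() % element_bytes_ == 0);
    }
    size_t size() const override { return bytes_.size() / element_bytes_; }
    size_t elementBytes() const override { return element_bytes_; }
    const char *getElement(size_t id) const override {
        assert(id < size());
        return bytes_.data() + id * element_bytes_;
    }

private:
    std::vector<char> bytes_;
    size_t element_bytes_;
};

// Serialization lives in the index (algorithm) layer: count, then each vector.
void saveIndexVectors(const RawDataContainer &vectors, std::ostream &out) {
    uint64_t count = vectors.size();
    out.write(reinterpret_cast<const char *>(&count), sizeof(count));
    for (size_t id = 0; id < vectors.size(); ++id)
        out.write(vectors.getElement(id),
                  static_cast<std::streamsize>(vectors.elementBytes()));
}
```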

alonre24 requested a review from GuyAv46 on November 26, 2024 08:28
alonre24 changed the title from "Raw vectors data layer in HNSW + move to base class" to "Raw vectors data layer in HNSW + move to base class [MOD-7496]" on Dec 16, 2024
alonre24 added this pull request to the merge queue on Dec 19, 2024
Merged via the queue into main with commit 1e55ba7 on Dec 19, 2024
20 checks passed
alonre24 deleted the raw_data_layer_hnsw branch on December 19, 2024 14:13
github-actions bot pushed a commit that referenced this pull request Dec 19, 2024
* use data blocks container in HNSW + adjust tests

* Move vectors raw data to base class

* fix for test serialization

* fix test to be safe with parallel insertions

* try lock only for get next results in batch iterator in bindings

* todo

* remove alignment from base index class + move vector allocation and destruction to base

* Add serializer version that does not persist blocks (to be used in the future for other raw data layers)

* Add test to cover old serialization version + small improvements

* fix test

* fix test for cov

(cherry picked from commit 1e55ba7)
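The commit "Add serializer version that does not persist blocks" implies that restore has to branch on the file's version so that old files stay loadable. A minimal sketch, assuming two hypothetical layouts (an older one with per-block metadata and a newer flat one) and made-up version tags; the actual on-disk format and enum names used in the PR are not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical version tags (not the PR's real values).
enum class SerVersion : uint32_t { WithBlocks = 1, NoBlocks = 2 };

template <typename T>
T readPOD(std::istream &in) {
    T value{};
    in.read(reinterpret_cast<char *>(&value), sizeof(T));
    assert(in && "truncated input");
    return value;
}

// Returns all vectors' floats in id order, whichever layout the file used.
std::vector<float> loadVectors(std::istream &in, uint32_t dim) {
    auto version = static_cast<SerVersion>(readPOD<uint32_t>(in));
    std::vector<float> data;
    if (version == SerVersion::WithBlocks) {
        // Old layout: number of blocks, then per block its element count and payload.
        auto num_blocks = readPOD<uint32_t>(in);
        for (uint32_t b = 0; b < num_blocks; ++b) {
            auto n = readPOD<uint32_t>(in);
            for (uint64_t i = 0; i < uint64_t(n) * dim; ++i)
                data.push_back(readPOD<float>(in));
        }
    } else {
        // New layout: a single vector count, then the raw floats.
        assert(version == SerVersion::NoBlocks && "unknown version");
        auto count = readPOD<uint64_t>(in);
        for (uint64_t i = 0; i < count * dim; ++i)
            data.push_back(readPOD<float>(in));
    }
    return data;
}
```

Both layouts produce the same vectors, which is the property a test covering the old serialization version would check.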

Successfully created backport PR for 8.0:

github-merge-queue bot pushed a commit that referenced this pull request Dec 19, 2024
…574)

Raw vectors data layer in HNSW + move to base class [MOD-7496] (#523)

(cherry picked from commit 1e55ba7)

Co-authored-by: alonre24 <alon.reshef@redis.com>
2 participants